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ABSTRACT 

There is enormous potential to advance cosmology from statistical 
characterizations of cosmic microwave background sky maps. The angular power 
spectrum of the microwave anisotropy is a particularly important statistic. 
Existing algorithms for computing the angular power spectrum of a pixelized 
map typically require 0{N^) operations and 0{N'^) storage, where N is the 
number of independent pixels in the map. The MAP and Planck satellites will 
produce megapixel maps of the cosmic microwave background temperature at 
multiple frequencies; thus, existing algorithms are not computationally feasible. 
In this article, we introduce an algorithm that requires 0{N'^) operations and 
0(A^3/2) storage that can find the minimum variance power spectrum from sky 
map data roughly one million times faster than was previously possible. This 
makes feasible an analysis that was hitherto intractable. 



1. Introduction 



The MAP and Planck satellites have the potential to yield enormous amounts of 
information about the physical conditions in the early universe (Bennett et al. 1995; 
[http:// map . gsf c . nasa . go"v| ; Bersanelli et al. 1996; |http: / / astro. estec.esa.nl/Planck|) . For 
example, with full sky coverage and an angular resolution of 0.21° at its highest frequency, 
MAP should return accurate measurements of the cosmic microwave background (CMB) 
temperature anisotropy at roughly one million independent points on the sky. If the 
temperature fluctuations are consistent with inflationary models, these measurements will 
provide accurate determinations of most of the significant cosmological parameters (Spergel 
1994; Knox 1995; Hinshaw, Bennett & Kogut 1995; Jungman et al. 1996; Zaldarriaga, 
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Spergel & Seljak 1997; Bond, Efstathiou & Tegmark 1997). If they are not consistent with 
inflationary models, it will be even more exciting as we will need to rethink our ideas about 
the physics of the early universe. 

There are a number of non-trivial numerical steps involved in comparing 10^^ 
temperature differences with the predictions of a particular cosmological model. Several 
of the steps in the calculation are now clear. Wright, Hinshaw & Bennett (1996) found a 
rapid and exact algorithm for producing megapixel cosmic microwave background maps 
from differential data. Seljak & Zaldarriaga (1996) present a fast algorithm for computing 
the power spectrum for a given cosmological model. However, to date, we have lacked an 
efficient technique for computing the power spectrum from an observed sky map. The 
techniques that were used to extract the power spectrum from the COBE data (Gorski 
1994; Gorski et al. 1994; Bond 1995; Tegmark & Bunn 1995; Gorski et al. 1996; Hinshaw 
et al. 1996; Tegmark 1996; Bunn & White 1997) cannot easily be extended to the high 
resolution data since these techniques require 0{N'^) operations and 0{N'^) storage. For a 
single frequency of MAP data, for example, this would take > 10^^ operations, with > 10^^ 
operations required for a combined analysis of polarization and temperature maps. All 
proposed techniques to date are thus wholly unequal to the task of analysing upcoming 
data sets (Borrill 1997). 

In this paper, we present a fast, accurate method for extracting the power spectrum 
from realistic simulations of high resolution data at a single frequency. In §2-5 we develop 
our numerical method for determining the power spectrum from data containing only 
cosmological signal and detector noise. In §6, we show how the same approach can be 
used to directly estimate cosmological parameters from the maps. In §7, we apply the 
method to realistic simulations of MAP data that include spatially varying noise and a 
galactic sky cut. We show that our numerical method recovers minimum variance, unbiased 
estimates of the power spectrum and of cosmological parameters. We summarize our results 
in §8. A subsequent paper will extend the techniques developed here to the analysis of 
multi-frequency data and polarization data. 

2. McLximum Likelihood Determination of the Power Spectrum 

The basic problem is to extract the angular power spectrum of the CMB temperature 
fluctuations from noisy data. We can expand the CMB temperature in spherical harmonics. 



l,m 
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where wi is Legendre expansion of the experimental window function. If inflationary models 
are correct, the af^^ coefficients are Tincorrclatcd Gaussian random variables with zero mean 
and a variance that is independent of orientation 



where ^ is the angular power spectrum we wish to estimate from the maps. 

Like COBE, MAP and Planck will produce full sky maps at a number of frequencies. In 
this paper, we will focus on the analysis of the highest resolution map from MAP (0.21° at 
94 GHz), which is expected to contain useful signal up to at least multipole order I ~ 1000. 
The data in a given map may be expressed as a superposition of several terms 



where 6t is the CMB signal convolved with the instrument's beam response, n is the 
detector noise, g is the systematic error, and f is the foreground emission. Note that the 
measurement vector m can be expressed either in a pixel basis, with components mj, 
the observed temperature in pixel i, or in a spherical harmonic basis, with components 
'fnim — 0,1m, the observed moment of the spherical harmonic Yi^. The choice of basis to 
use for power spectrum estimation requires weighing many trade-offs which we discuss in 
this and subsequent sections. Key among these is the form of the covariance matrix that 
describes the data, m. 

While MAP will measure temperature fluctuations across the entire sky, we will want 
to exclude pixels in the Galactic plane from our analysis, as well as other randomly located 
pixels that contain bright, non-cosmological sources. In this paper, we will assume that we 
can remove foreground emission by simply excluding the galactic plane and bright sources. 
The resulting incomplete coverage of the celestial sphere precludes naively evaluating a 
spherical harmonic expansion of the data by direct integration. The analysis is further 
complicated by AMP's inhomogeneous sampling of the sky: the noise per unit area will vary 
by more than a factor two between the ecliptic poles and the equator. MAP^s differential 
design has been optimized to minimize systematic errors and to produce maps with spatially 
uncorrelated noise. We will assume that we can ignore systematic errors and that the noise 
is uncorrelated from pixel to pixel. We have not yet tested the performance of our algorithm 
with maps that that contain significant pixel to pixel noise correlations or "stripes" . 

With the above assumptions, the covariance matrix of the data takes the form 
C = (mm^) = S + N where S = {StSt^) is the covariance of the CMB fluctuations, and 
N = (nn-^) is the covariance matrix of the noise. In the pixel basis, the noise matrix is 
diagonal (Nj^ = CTi^ij, where Uj is the rms noise in pixel i) while in the spherical harmonic 
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m = 5t-|-n-|-g-|-f 



(3) 
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basis the signal matrix is diagonal 

S(lm){lmy = (a'^afZ*) WlWv = cf^ wf 6(lm)(lmy . (4) 

For the sake of notational simplicity, we shall henceforth set 

Q = (5) 

and solve for q. Note that the signal matrix in the pixel basis can be easily expressed in 
terms of spherical harmonics 

S,, = {6U6t,) = YiU^,)S^i^)^i^yY;^,{n,) = YSY* (6) 

{lm),{lmy 

where Y* is the complex conjugate of Y. The form of the noise matrix in the spherical 
harmonic basis is deferred to the next section. 

We wish to estimate the angular power spectrum from the data by explicitly maximizing 
the likelihood function, which, for Gaussian fluctuations, has the form 

exp ( — im^C^^mj 
^(^'""^^ = (2vr)^/^(detC)V^ 

where C is the full covariance matrix. A number of authors (Gorski 1994; Gorski et al. 
1994; Bond 1995; Tegmark & Bunn 1995; Gorski et al. 1996; Hinshaw et al. 1996; Tegmark 
1996; Bunn & White 1997) have used this approach to extract the power spectrum from 
the COBE data. 

We find the most likely power spectrum by expanding the likelihood function as a 
Taylor series 

1 d'f 



I 9ci 



1,1' 2 dcidc, 



[Ci - Ci) [Ci> - Q/ 



where the over-bar indicates the values at which / is minimized. Differentiating equation 
(0) yields 

^ = -(^]2\nC= -m^C-^P'C-^m + tr(C-^P') (9) 
dci \dcij 

where 

P' ^ (10) 

OCi 

In the spherical harmonic basis P' is a diagonal matrix with entries or 1, while in the 
pixel basis, it is given by 

P\,= ^^^Pi{cos^.,) (11) 
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where Pi is the Legendre polynomial of order /, and 7,^ is the angle between pixels i and j. 
The second order term in the likelihood function is 

= m^C-ip'C-ip''C-im - itr(C-ip'C-ipO. (12) 
2dcidci> 2 ^ ' ^ ' 

The expectation value of this quantity is the Fisher matrix 

where we have used (mm-'") = C. 

We can locate the most-likely power spectrum by finding where the derivative of the 
likelihood function is zero. Let cf*^ be a trial power spectrum, then the derivative of the 
likelihood in the neighborhood of cf^ may be written 



dci 



dci 



..(0) 



+ 2j2Fu'{cu-4?^)=0 (14) 



where we have approximated the second derivative by its expectation value. This suggests 
solving for the most-likely power spectrum using a variable metric method, 



(n+l) _ (n) 1 >^ ^-1 df 



(15) 

(n) 



Note that this is exactly the Newton-Raphson method for non-linear systems of equations 
(Press et al. 1992), which is well known to converge quadratically near the neighborhood 
of a root. Its only potential problem is its poor global convergence properties, which can 
occur when the approximation in equation (|^) is not valid. In practice, we have found 
the likelihood function is sufficiently well-behaved that this is never a problem, even with 
extremely poor starting guesses. The structureless nature of the hkelihood function has 
also been noted by other authors (Bond, Jaffe & Knox 1998). This is easy to understand 
intuitively. As we demonstrate later the q parameters are only weakly correlated with each 
other as the Fisher matrix is diagonally dominant. Thus, to first order, we are performing 
a series of one-parameter likelihood maximizations, with small corrections for couplings. 
In each dimension, the likelihood is very well approximated by a parabola (since the 
probability distribution for each c; is close to Gaussian). In particular, no local extrema 
exist, so there is no danger in employing Newton-Raphson. This also explains the extremely 
rapid convergence of the method - typically 3-4 iterations. 

Our equations are essentially equivalent to those of Bond, Jaffe & Knox (1998) 
(in addition, Tegmark (1997) obtained equivalent results by considering a quadratic 
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optimisation problem). However, we have recast them into a form that is computationally 
more tractable. The key advance is in reducing all ostensibly 0{N^) operations to 0{N'^) 
operations. We highlight the crucial elements of this improvement below, and discuss them 
more fully in subsequent sections. 

• C"^m - Rather than evaluating this expression directly using Choleski techniques, 
we solve it iteratively using conjugate gradient techniques (Press et al. 1992, Barrett et 
al. 1994), which are applicable to symmetric, positive definite linear systems. A good 
approximate inverse or preconditioner, C~^, is essential to reducing the number of iterations 
required. We find such a preconditioner by exploiting the approximate azimuthal symmetry 
of the noise pattern on the sky. This approach requires O((A^/20)'^/^) memory for storage 
and takes O((A^/20)^) operations to compute. In addition, each iteration of the conjugate 
gradient method involves performing a matrix multiplication of the form Cz where z is a 
vector. We speed this up by writing C as a convolution of diagonal matrices and spherical 
harmonic transforms. By employing fast spherical harmonic transforms (Muciaccia, Natoli 
& Vittorio 1998; DriscoU & Healey 1994), we are able to reduce the cost of the matrix 
multiphcation from O^N"^) to O^N^/"^). 

• tr(C^^P') - We first compute this term approximately by assuming azimuthal 
symmetry of the noise. We then compute it exactly using Monte Carlo simulations of maps 
and exploiting the fact that 



where we have used (mm-'") = C. The errors obtained by computing the trace term 
with Monte Carlo simulations rather than exactly are ^1 + 1/Nmc times larger than the 
minimum variance errors, where Nmc is the number of simulations used to compute the 
trace. Thus, if we generate 100 simulations, our errors will be only 0.5% larger than the 
minimum variance errors. The total cost of this step is O^NmcNuerN^^'^) , where N^^r is the 
number of iterations used to evaluate equation ([T3|). 

• F - Note that we only require an approximate second derivative to converge to 
the maximum of the likelihood function - the solution is fully independent of F. We 
therefore compute F approximately using the preconditioner previously computed, 
which requires O((iV/20)^) operations. Once we obtain the maximum likelihood q, we use 
Monte Carlo simulations to obtain their probability distribution and hence their errors. 




(16) 
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3. Iterative Evaluation of the Term C 

A key step in our analysis is to iteratively (and rapidly) solve the linear equation 

Cz = m (17) 

using conjugate gradient techniques. Note that this solution is only used in the evaluation 
of terms of the form m-^C^^P'C^^m. Thus, we have complete freedom to obtain this 
solution in pixel space (where the data vector is simply the temperature map) or in spherical 
harmonic space (where the data vector is the least squares fit of spherical harmonic 
coefficients to the temperature map). In pixel space, we can use the addition theorem for 
spherical harmonics to write 

I 

m^C-^P'C-^m = Yl m^C-^Y;„Y,^C-^m (18) 

m=—l 

which involves taking spherical harmonic transforms of the filtered, cut map, C~^m. In 
spherical harmonic space, P' is simply a diagonal matrix with ones or zeros along the 
diagonal, so the evaluation of m^C~^P^C~^m is even simpler. 

Our choice of the appropriate space to work in is motivated by a second consideration: 
the conjugate gradient technique requires that we compute an appropriate preconditioner 
matrix. Given a linear system Ax = b, where A is symmetric and positive definite, the 
preconditioner is a symmetric positive definite matrix A, such that A~^A = I + R, where 
the eigenvalues of R are all less than 1. The preconditioned conjugate gradient technique 
then solves the system 

A-^Ax = A-^b (19) 

by generating a series of search directions and improved iterates. Specifically, it generates 
a sequence of coupled recurrence relations for the residual vector r^*) = b — Ax^*-* and the 
search direction p*^*) 

r« = r(^-i) - a^ApW (20) 
= A-^r(^-^) + A_ip(^-^) (21) 



The scalar ai is chosen to minimize the quadratic function /(x) = (x*^*'' — x)"^A(x*^*'' — x), 
where x is the exact solution to Ax = b 



pWAp^' 



ai = ■ (22) 
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The scalar is chosen to ensure that the residuals are A ^ orthogonal (i.e., r^*-* A ^r'--'^ = 
for i 7^ j) 

In this manner the quadratic function /(x) of the improved iterate 

= + aip(^) (24) 

is minimized over the whole vector space of conjugate search directions already taken, 
{pi, p2, ...}. The routine is initialized by setting r^°) = b — Ax'-^-' for some initial guess x*^°\ 
and setting p^^^^ = A~^x*^°\ In general, the number of search directions required to span 
the vector space of possible solutions is A^; the preconditioner reduces this by transforming 
the contours of /(x) to be as spherical as possible. To the extent that this is achieved the 
number of independent directions to minimize over becomes very small and the conjugate 
gradient routine converges quadratically. More precisely, the number of iterations required 
is proportional to \/k2' where K2 is the condition number of the matrix A~^A (i.e., the 
ratio of its largest to smallest eigenvalue). Further details may be found in Barrett et al. 
(1994) and Press et al. (1992). 

There are two conflicting requirements for a preconditioner: it must be a sufficiently 
good approximation to the true inverse that K2 is not too large, yet it must be sufficiently 
sparse that it is significantly easier to compute and store than the original matrix. From 
this comes our choice of the correct space to work in: at low / where the signal dominates, 
we want to work in spherical harmonic space (where S is diagonal), while at high /, where 
the noise dominates, we want to work in pixel space (where N is diagonal). 

Thus, we begin by considering how to transform the system into spherical harmonic 
space. We can obtain the best-fit multipole moments, mim = wi m1^, by minimizing 

2 

nii - J2 ^im yimi^i) (25) 

Lm 



where the sum is over the uncut pixels, 0", is the rms noise in the ith pixel, rrii is the 
observed temperature of the ith pixel, Vti is the direction to the center of the ith pixel, 
and Wi is the Legendre transform of the experimental window function. By differentiating 
equation (|25|) , we derive the normal equations (Press et al. 1992) 

N^^m = y (26) 

where 

-ivT- 1 _ \ ^ yimiS^i) ^rm'(^t) (07^ 

[lm.){lm.)' ~ 2^ 2 \^') 
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is the inverse of the noise matrix for the multipole moments, and 

y(M = 2^ — -2 — 

i i 

is the variance weighted spherical harmonic transform of the temperature map. The 
solution of the set of normal equations is a minimum variance estimate of m(/„) and the 
uncertainty in each multipole is determined by N(im)iimY- 

The normal equations are ill-conditioned because N^^ has null vectors induced by the 
cut pixels. Fortunately, we do not need to compute m, but rather z = (S + N)~^m. We 
can do this by solving the linear system 

(S + N)z = m = Ny (29) 



We can put this into a computationally more tractable form by multiplying both sides by 
gi/2p^-i^ giving 

(l + S'/^N-'S'/') S^/^z = S^/V (30) 

The reason for putting the system in this form is twofold. First, it only involves N^^, not 
N, which cannot be easily evaluated. Second, the matrix A = I + S^'^^^N^-'^S^/^ (which we 
will apply the conjugate gradient technique to) is well conditioned in the sense that its 
eigenvalues only span a few orders of magnitude. The second term - which is explicitly the 
signal to noise ratio - is regularized by the presence of the identity matrix. Alternative 
forms such as (N~^ + S~^) Sz = y do not share this property - these matrices have 
eigenvalues that span a wide dynamic range. 

We obtain a good preconditioner by splitting the matrix into two parts 

V I + diag(Si/2N-iS"2) ) ^ ' 

where is a sparse, approximate form of N~^, given below. The lower right hand block 
is used for large I where the noise dominates the signal, so that S^^^N^^S^^^ is small. In 
this regime, we obtain z y, the spherical harmonic transform of the inverse variance 
weighted map. At small / the signal dominates, so we need to be able to approximate the 
form of the full noise matrix. For MAP 2-year data, we find that the above preconditioner 
works well if we split the matrix at /cutoff = 512. 

We now make a brief detour to explore the structure of the inverse noise matrix by 

expanding it in terms of the Wigner 3-j symbols. This gives us both a computationally 
efficient method to evaluate N~^, and, more importantly, provides some insight into the 
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structure of N ^ due to the sky cut and noise pattern. We first expand the inverse variance 
(weight) map in spherical harmonics. The muhipole moments of this map are 

w,„ = E^^^- (32) 

Using the completeness of the spherical harmonics, we may expand the inverse noise matrix, 
equation (P7|), as 

{Im)" i 

^ f {21 + 1){21' + l){2r' + l) y^^ f I V I" 

in)" V y 1^ y V ^ ^' ^" 

where the terms in brackets are the 3-j symbols. The first symbol is only non-zero 
when |/ — /'I < I" < |/ + /'|. The second symbol imposes the additional constraint that 
m — m' + m" = 0. We use numerically stable recurrence relations (Schulten, Klaus & 
Gordon, 1975) to compute the symbols. Alternatively, N^^ may be computed by direct 
summation, which also requires 0{N^^'^) operations. 

This expansion suggests a simple approximation to the weight matrix. The dominant 
feature in the weight map is the galactic sky cut which is, to first order, azimuthally 
symmetric in galactic coordinates. In the limit of pure azimuthal symmetry the weight 
matrix would be block diagonal (proportional to Smm')- This suggests a preconditioner of 
the form 

^{lL){lmy = ^{lL){lmy^irim'- (34) 

Since (I + 8^/2^-181/2) jg block diagonal, we can compute its inverse in 0{N^) steps. It IS 
actually significantly less than this, due to the decreasing size of the block matrices. The 
matrix A"^ requires the largest amount of memory for storage, 0{{N/20)^^'^), where the 
savings in the pref actor result from the facts that 1) we only need the full matrix up to 
/ = 512, 2) all of the block matrices are symmetric, 3) the blocks are of decreasing size 
as m increases, and 4) parity is preserved - the covariance between even and odd I terms 
vanishes. For a 2 million pixel map, only 100 MB of memory is required to store the 
preconditioner in double precision. 

The block diagonal approximation is an excellent ansatz to which we need only apply 
small perturbative corrections. Why does it work so well? To answer this, we must 
understand the sparsity pattern of N^^ or, equivalently, its diagnostic W(;m)- The MAP 
weight map is approximately axisymmetric in ecliptic coordinates. Rotation to galactic 
coordinates (which imposes a tilt of about 60°) introduces a smooth, azimuthal variation 
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in the noise pattern that can be perturbatively expanded in a Fourier series e*'"'^. The 
approximation of galactic axisymmetry used in the preconditioner is the largest (m = 0) 
term in such an expansion. We can quantify the fall-off in ^{im) with increasing m by 
computing the "power" at each m, defined as 

Cm ^ -r . E M- (35) 

Figure |I| shows that corrections to the m = mode are at most a few percent. The only 
high frequency contribution to the weight map comes from point sources. Since they only 
occupy ~ 5% of the sky, their effect is small. 

A note on overall computational cost: each iteration of the conjugate gradient routine 
requires forming the products Awi and A~^W2 for some work vectors wi and W2. The 
former involves 2 products with diagonal matrices (S^''^) and a forward and inverse spherical 
harmonic transform, each 0{N^/'^). The latter involves a matrix- vector multiplication where 
A~^ has fewer than 0{N^^'^) non-zero elements. Thus, the cost per iteration is 0{N^^^), 
and the total cost of solving the linear system is 0{NiterN^^'^)- A good preconditioner is 
the key to minimizing Nuer - in its absence, Nuer ~ N. We find that the linear system, 
equation (^), can be solved in about 6 iterations for the specifications appropriate to the 
2-year MAP data and the preconditioner specified above. This requires ~30 seconds to 
solve for a 500,000 pixel map, and ~ 250 seconds to solve for a 2 million pixel map, running 
as a single processor job on an SGI Origin 2000. 

Note that the preconditioner we have described is optimized for the MAP experiment. 
The choice of the appropriate preconditioner is likely to vary from experiment to experiment, 
in particular due to the different weight matrix of each experiment, depending on its 
survey geometry and noise properties. Since the normal equations for the time-ordered 
data are solved in the map-making process, the weight-matrix already arises naturally 
in the previous step of the pipeline (Wright 1996). Its sparsity properties are therefore 
fairly well understood already at this point, and may be exploited in the construction 
of the preconditioner. This is another advantage of our approach, compared to previous 
techniques cast in terms of N. Note that our use of fast transforms to reduce the cost of 
matrix multiplies to 0{N^^'^) is fairly general. In the complete absence of any intuition 
about symmetries in the noise and geometry of the experiment, one can still use the fact 
that almost all the matrices we deal with in CMB experiments are diagonally dominant 
(the correlation function of both the signal and noise fall rapidly with separation in 
any experiment, and thus off-diagonal elements die rapidly). Thus, one possibility is to 
approximate a matrix by its diagonal. More general techniques for sparse symmetric 
positive definite matrices exist, e.g. Cholesky multifrontal methods (Liu 1989), which can 
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be used to build preconditioners. 



4. Computation of the Trace 



While we can use the preconditioner to compute m^C^^P'C~^m rapidly, it is still 
very time consuming to evaluate tr(C~^P') for each value of I by iteratively solving a linear 
equation for each (Im) term in the trace. There are two approaches to avoiding a brute 
force evaluation of the full trace. 1) Approximate tr(C~^P') as tr(S~-^/^A~^S^/^N~^P') 
using the fact that C^^ = S~"^/^A~^S^/^N~^. This returns a high quality estimate of q. 2) 
Evaluate the trace for a given c; using Monte Carlo simulations of maps drawn from that 
spectrum. This approach exploits the fact that 



where we have used (mm-^) = C. This is the most computationally expensive part of 
our algorithm: it requires O {NmcNuerN^^'^) operations. Therefore we first maximize the 
likelihood function using the approximate trace (method 1), only switching to the Monte 
Carlo evaluation of the trace once the former solution has converged. Since we are very 
close to the true answer at this point, the Monte Carlo solution converges very quickly. Note 
that the Monte Carlo method is guaranteed by construction to be unbiased. Additionally, 
we can include non-linear effects in the synthetic maps that are not easily modeled in 
the covariance matrix C. This will both correctly alter the maximum likelihood point 
and propagate through to the error estimates. We can further generalize the method to 
incorporate all known effects in our analysis by simulating the full analysis pipeline, rather 
than just the processed map. 

How many Monte Carlo evaluations of q^*^ = mWc~^P'C^^m'^*\ where i is a 
realization index, are necessary to obtain an accurate determination of tr(C~^P')? To 
answer this question, we need to determine how the variance in q^*^ propagates through 
to variance in the recovered q. Suppose we were able to calculate (q) exactly for a given 
power spectrum q (e.g., by using an infinite number of simulations). The variance in our 
recovered q would then be given by the Cramer-Rao minimum variance bound 




(36) 



.recovered 



c*™")(c' 



.recovered 



c*™r) = ^F-^((q-(q))(q-(q)r)F-^ 



(37) 



which implies 

((q-(q))(q-(q))^) = 4F (38) 
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Now we do not know (q), but (q)', obtained by averaging q*^*) over A^^c uncorrelated 
Monte Carlo simulations. Hence, (q)' has a Gaussian distribution, with variance 



^U' = ]^^a(0 (39) 



' mc 



which implies 



'^recovered ^true^ ^ ^recovered ^true^T^ | ]^ _j_ j JT^l (40) 



where we have added the errors in quadrature. Thus, if we average 100 Monte Carlos to 
evaluate (q)', our errors will be only 0.5% larger than the minimum variance case. 

We have verified these expectations numerically. For a 512 x 1024 pixel test map (note 
that this is smaller than the 1024 x 2048 map we analyze in subsequent sections), we recover 
Q by maximizing the likelihood function, but in each instance we compute (q)' using a 
different number of Monte Carlo simulations. For each solution 4""-^ we then compute 

= (c(^-) - c*™<=)^F(c(^™) - c*™<=) (41) 

which is expected to have a mean of (1 + l/Nmc){lmax ~ !)• For consistency we must use 
the same Fisher matrix for each Nmc, but this only affects the overall normalization of 
Xn^^- Here we have used the Fisher matrix of the true underlying power spectrum. Figure 
shows that to a very good approximation 



Xn„ 



oc 1 + -— . (42) 



\''max ^ / 

Note that the quadratic convergence of the Newton-Raphson method means that the 
number of significant digits is doubled at each iteration. We can match this rate of 
convergence by doubling the number of Monte Carlo simulations used to compute the trace 
at each iteration. This saves some computer time in the early iterations when high precision 
is not required. 



5. The Fisher Matrix and Error Estimates 

As with the trace, a good approximation to the true Fisher matrix is given by 

F;, = ^tr (C-ip'C-ip'') (43) 

where C"^ = S^^/^A^^S^^^N^^. Since A~^ and have already been computed when 
forming the preconditioner, and since P' consists only of zeros and ones along the diagonal. 
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calculating this approximate Fisher matrix only entails the multiplication of block diagonal 
matrices, which is relatively quick. It is important to note that the Newton-Raphson 
method only requires an approximate Fisher matrix to converge to the maximum likelihood 
solution - the final solution is independent of the Fisher matrix. 

Of course any subsequent analysis of the power spectrum, such as cosmological 
parameter fitting, requires accurate error estimates for the q. These are often obtained 
by assuming the are Gaussian distributed and computing their covariance matrix F~^. 
This description is likely to be fine for large / (say / > 32) but for small / we would like 
a better description. This is because the number of independent modes (which scales as 
21 + 1) is small for low /, and only at large / does the Central Limit Theorem guarantee that 
the q's are Gaussian distributed. One could attempt to compute the distribution of the q 
directly from the Monte Garlo simulations, independent of any assumptions of Gaussianity. 
Specifically, for low I we should work with smoothed maps (with a small number of pixels), 
which may be solved in a matter of seconds. For each realization i, one can find the 
maximum likelihood power spectrum cf^ by the same procedure used to solve for the full 
map. By making thousands of realisations, one immediately obtains the full probability 
distribution of the c/'s, including higher order moments, and not just the second moment 
(as in the Fisher matrix formalism). If we sort the recovered cf^'s, we immediately obtain 
(in general asymmetric) confidence intervals. This Monte-Carlo method of computing the 
distribution of errors makes no assumptions about the shape of the likelihood function, 
apart from having an identifiable maximum. Note that each realisation requires fresh 
computations of tr(C~^P') in order to iterate to convergence. It is thus prohibitively 
expensive (and unnecessary) for high I, and we recommend this procedure up to Z = 32. 

For higher order multipoles, we need only compute the Fisher matrix. In principle we 
can use our Monte Carlo results to compute 



Unfortunately this is a factor of NugJmax times more expensive than computing tr(C ^P'), 
which is prohibitive. However, we have already noted that 



Thus, in principle, one can obtain the Fisher matrix from the Monte Carlo simulations for 
free. In practice, we have found this to be a good recipe for computing the (comparatively 
large) diagonal elements, but to be too noisy for use in computing the (comparatively small) 
off-diagonal elements. To overcome this, we use the fact that the shape of the Fisher matrix 
(which also yields the window function for the c;) depends mainly on the geometry of the 




(44) 




(45) 
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weight map and only weakly on other aspects of the map. We therefore extract the shape 



Note that for large /, where the signal-to- noise becomes low due to beam smearing, 
the off-diagonal terms in the Fisher matrix start to become significant. This is easy to 
understand intuitively: the statistics of the q at large / are dominated by noise, which is not 
rotationally invariant (see Appendix A). This induces correlations amongst the multipoles. 
However, for the signal-to-noise ratio of the 2-year MAP data, this effect is unimportant 
(see Figure We find that our results for change by very little if we approximate the 
Fisher matrix as diagonal. 

We emphasize that our final Fisher matrix is extremely accurate and may be used 
in good faith for parameter estimation. The calculation of the diagonal elements is 
exact and may be performed to arbitrary accuracy by increasing the number of Monte 
Carlo simulations. The scaling of the off-diagonal elements is approximate and uses 
the azimuthally averaged noise map. However, the off-diagonal elements make a small 
contribution to the Fisher matrix to begin with, as quantified by the small change in x^- 
The effect of excluding non-azimuthal corrections, which are down by another two orders 
of magnitude (Figure |l|), is therefore utterly negligible. Thus, while it is possible to make 
perturbative corrections to the scaling of the off-diagonal elements, this is unnecessary 
in practice. Another way of saying this is that the dominant source of cross talk among 
multipoles in the Fisher matrix is the galactic cut. Note that the neglect of azimuthal 
noise variations would not be valid for the diagonal elements since it would lead to an 
underestimate of the variance. 

One should be careful in the choice of q used to compute the final Fisher matrix. It is 
not correct to use the recovered c; because they are noisy. Any / for which we recovered a 
low Q would be assigned a spuriously small cosmic variance contribution to its error, and 
hence be given more weight in any subsequent analysis. This would consistently bias any 
fit to the power spectrum in the direction of less power. This effect has been discussed by 
Bond, Jaffe & Knox (1998), and Seljak (1997). We therefore invoke our prior expectation 
that the underlying power spectrum is smooth, and smooth the recovered q with a spline 
prior to forming our final error estimates (see Appendix B for the smoothing procedure). 



of the approximate Fisher matrix F from equation ( ^31) and renormalize to the variances 
obtained from the Monte Carlo simulations F^*-^ i.e. 
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6. Estimating Cosmological Parameters 



If we have a cosmological model that appears to be a good fit to the data, we can use 
the likelihood approach to directly compute cosmological parameters from a map. Rather 
than solve for the power spectrum q, we can solve for a set of parameters, p = {Qb, ^cdm, 
Ho, etc.), that predict the spectrum. 

Proceeding as in §2, we can maximize the likelihood function for p by expanding it as 
a Taylor series around its maximum at p 



, dpk 



{Pk - Pk) {Pk' - Pk' 
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which leads to the analog of equation (|T5| 
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The derivatives of the likelihood function may be computed using the chain rule 
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Similarly, the Fisher matrix of the parameters is given by 
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(50) 



dpk' 



^dpk \ 
- \fWk^"d^- 

The model power spectra may be computed with a fast numerical code (Seljak & 
Zaldarriaga 1996) and the partial derivatives dci/dpk may be computed using a finite 
difference approximation, typically using differences of order 2% of each parameter's value. 
As before, one computes the first and second derivatives of the likelihood function with 
respect to the q (equations (^ and (0)) via Monte Carlo techniques. 

In practice, using Newton-Raphson as a root finding technique in parameter space is 
less straightforward than when solving for the power spectrum q. The radius of convergence, 
the region in which the Taylor expansion is valid, is substantially smaller. Its scale is set 



by (F 



-ni/2 

Ikk 



. In addition, the near degeneracies between various parameters create narrow 
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valleys in likelihood space for which a Newton approach is not optimal. Techniques exist 
which could overcome these difficulties, e.g. the Levenberg-Marquardt method (Press et 
al. 1992), which employs a control parameter A to smoothly modulate between a steepest 
descent method (A ^ 1) and a Newton method (A <C 1) 

Fiv^Fi,{l + XSi,). (51) 

Rather than developing such tools in this paper, we instead explore a simple fit to the 
power spectrum 

X'(P) = E(cKp) - cr^'^'-^'i)F„,(c,(p) - c[f— ^). (52) 
w 

We find that this gives an excellent approximation to maximizing the full likelihood function 
from the map. To see why, consider the probability distribution of cosmological parameters 
given the recovered q 



(53) 



fitting is a maximum likelihood estimator in the limit where F is constant. Since F has 
contributions from the assumed cosmological signal (which is being varied) as well as from 
pixel noise, this is not strictly the case. However, as we demonstrate below, we determine 
the power spectrum with excellent precision at low I. For I > 600, where the confidence 

bands begin to broaden, the signal to noise drops rapidly due to beam smearing. In this 
regime fixed pixel noise is the dominant contribution to the variance of the q. Overall, we 
have excellent knowledge of the Fisher matrix before we begin a fit, so holding F constant 
is an excellent approximation. A consistency check may be performed by comparing the 
power spectrum of the best fit cosmological model to the spline fit of the recovered power 
spectrum. If the two are very close, the Fisher matrices computed from either one will be 
virtually indistinguishable. 



7. Results 

In this section, we apply our numerical techniques to a realistic simulation of the MAP 
data. We simulate two years of W band (94 GHz) data using our current best estimate for 
the detector noise. We apply a Galactic plane cut that excludes the region |6| < 10°, and we 
cut an additional 5% of the sky at random to simulate the effect of excising extragalactic 
point sources. We assume that the noise is azimuthally symmetric in ecliptic coordinates 
and that its variance changes by a factor of two between the ecliptic pole and ecliptic plane. 



- 18 - 



The maps are generated on a grid of galactic longitude and latitude points (/, b) which 
result in smaller map pixels near the galactic poles (see Appendix A). We scale the noise 
per pixel inversely with pixel area, in addition to the intrinsic coverage variations. 

We have obtained results for a 1024 by 2048 pixel map with power up to Imax = 1024, 
using 10 Monte Carlo simulations to compute tr(C~^P') and F. With 10 simulations, 
our expected errors are 5% larger than the minimum variance limit. The entire process 
converged in 5 iterations and took 10 hours of cpu time on an 8 processor SGI Origin 2000 
computer. The recovered power spectrum is shown in Figure ^. Note that some negative 
Q at high / are to be expected. They reflect the fact that the variance in those modes 
was less than that expected from the noise alone. One could adopt a prior distribution to 
prevent the ci from going negative. However, this would complicate the error analysis as 
the probability distribution of the q would become skewed. 

Since we know the true input power spectrum, we can quantify the goodness of our fit 
by computing 

x' = J2SciFi,5ci, (54) 
1,1' 

where 5ci = cp™™''''^ — c^^'^, and F is the Fisher matrix of the input spectrum cf^'^. We 
compute F using equation (|i6|), with 128 Monte Carlo simulations to compute F^;. We 
use a large number of Monte Carlo simulations to ensure that we are comparing our 
results to the true minimum variance Fisher matrix. With N^of = Imax — 1 = 1023, we 
obtain /Nctof = 1.08, in accordance with our expectation that it lie within the range 
(1 + l/Nmc){l ± ^j2/Ndof ) = 1.1 ±0.05 (1 a). We find that 67% of the points lie within the 1 
a error band, and 95% lie within the 2 a band. In Figures | and ^ we plot 6 = Sci/ai, where 
af = (F^^)ii is the variance of each q. The distribution of errors is evidently Gaussian and 
has a mean value (6) = 0.01 and a standard deviation {{6 — (5))^) = 1.04, indicating that 
our results are both unbiased and minimum variance. 

A visually more impressive way to display the power spectrum recovery is to fit a 
smoothing spline to the recovered points. The details of the spline fit are given in Appendix 
B and the results are shown in Figure |^. Note that exploiting the prior that q is a smooth 
function of / allows us to come spectacularly close to the form of the underlying power 
spectrum. As described in Appendix B, we generate confidence regions on the fit by fitting 
splines to 128 Monte Carlo simulations of c; and sorting the fits at each /. The spline 
smoothing parameter has been chosen objectively using a process called cross-validation, 
which is a bootstrap technique (see Appendix B). If different criteria are used to compute 
the smoothing parameter, wiggles may appear in the fit (generally at high /), but it will 
generally stay within the depicted confidence bands. 
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To estimate cosmological parameters, p, we can use the spline fit as our best guess 
power spectrum for computing tlie Fislier matrix F. (Since tlie spline fit power spectrum 
and the input power spectrum are virtually identical, we have simply reused the Fisher 
matrix of the input spectrum we have already computed.) We then minimize 

= E(c/(P) - cr°™^^'^)Fz,(c,(p) - c[r™<^) (55) 
w 

where q(p) is computed for a given parameter set p using CMBFAST (Seljak & Zaldarriaga 
1996). We consider 6 free parameters: Vtb^ ^cdm, h, r, n, and the normalization, and we 
allow for a cosmological constant term, f^A, to enforce a flat universe: + ^cdm + = 1- 
We minimize using an adaptive non-linear least-squares routine in the PORT optimization 
package (Gay 1990) after diagonalizing the Fisher matrix so that may be written as 
a sum of squares. The input model used was standard Cold Dark Matter (sCDM) with 
parameter values 0.1, 0.9, 0.5, 0.0, 1.0, respectively. A starting guess was obtained by 
finding the best-fit model to the spline fit by eye (in fact, the true underlying model was 
unknown to one of us at the time). The minimization routine recovered parameter values 
0.09, 0.76, 0.53, 0.13, and 1.02, respectively. The standard errors on these parameters, as 
given by the parameter Fisher matrix, are 7 x 10"'^, 0.1, 0.02, 0.2, and 0.015, respectively. 
The best fit cosmological model yields x^/^dof = 1-078 with respect to the data, to be 
compared to 1.082 for the input model with respect to the data, both within the 2 a range 
of expected variations for x^- The input power spectrum and the best fit model are plotted 
in Figure they are virtually indistinguishable. 

Note that since the power spectrum of the best-fit model agrees closely with the spline 
fit, we are being self-consistent in holding F constant. If this did not occur, a computation 
that allowed F to vary might be called for. However, to the extent that the spline represents 
a non-parametric estimate of the underlying power spectrum, such lack of agreement might 
indicate deficiencies in the model, which may not have enough degrees of freedom. If one 
cannot obtain a reasonable value of x^, even with additional model parameters, then it 
would be reasonable to rule out that class of cosmological models. 

8. Summary 

We have developed an unbiased, minimum variance estimator for the power spectrum 
of temperature fluctuations in CMB maps. We have used it to estimate the power spectrum 
up to / = 1024 from a 2 million pixel map that simulates single frequency MAP data (94 
GHz) with realistic instrument noise and sky cuts. In contrast with existing algorithms, 
which require 0{N^) operations, our algorithm is 0{N'^), and thus can be run overnight on 
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existing workstations, rather than running for months at national supercomputing facihties. 
We anticipate further improvements in performance with more aggressive optimization (and 
parallehzation) of the code. 

In addition, we have estimated cosmological parameters from the recovered ci and find 
we can recover the input model with good precision. Thus, the pipeline from differential 
time-ordered data to cosmological parameters is now complete and well within present day 
computational capabilities. 

Possibilities for future work include extensions to multi-frequency and polarization 
data, as well as the inclusion of Galactic foregrounds and systematic effects, such as striping, 
in our treatment of the data. The former only incur a linear increase of order a few in the 
operations count. The latter must be dealt with by inclusion of the appropriate new terms 
in the covariance matrix, as well as subsequent simulation of the systematic effects in the 
Monte-Carlo trace computations. 
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A. Fast Spherical Harmonic Transforms 



The algorithm presented in this paper relies heavily on the fact that we have available 
fast, 0(iV^/^), forward and inverse spherical harmonic transforms on the sphere, as opposed 
to standard 0{N'^) transforms. We use a formulation which employs FFTs in 0, and 
thus requires that the map pixels be distributed on rows of constant 9. The scheme was 
first brought to the attention of the CMB community by Muciaccia, Natoli & Vittorio 
(1998). Alternative formulations exist (Dilts 1985) which also use FFTs in the 9 direction. 
The most sophisticated implementation requires only 0(A^(log A^)^) operations for both 
transforms, by using fast Legendre transforms (DriscoU & Healey 1994, Healy, Rockmore & 
Moore, 1996). In practice, due to cache problems in the use of precomputed data, current 
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implementations run no faster than the naive 0{N^^^) algorithms, though work is presently 
underway to remove this barrier (Kostelec 1998). For completeness, we summarize the 
transform method below. 



Making maps - Given a set of a^^'s, we wish to evaluate 

5t{9i,(f)j) = aimYim{Oi,(f)j). 

1=0 m=-l 

Now, observing that we can interchange the order of summation 



(Al) 



we can write 



where 
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l=Q m=-l 
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(A2) 



(A3) 



(A4) 



The Legendre functions may be generated using standard recursion relations. The 
obvious motivation for writing the expansion in this form is to employ FFT techniques in 
the evaluation. 

What is our total operations count? At fixed Oi, we need to generate ^^^'s. (Note 
therefore that our memory storage requirements are 0{N), which is entirely feasible). Since 
there are OiS to step through, the total operations count is I'^ax'^d ~ N^^'^. The cost of 
the FFT at fixed is only O(n0 log n,^). Since there are 0j's to step through, the FFT's 
only cost 0{N\ogN). Thus, to leading order, map making is an 0{N^/'^) process. 



Inverting maps - The formal inverse transform is defined in terms of an integral 



O'lr. 



Yi^{n)dt{n)dn 



which may be evaluated with a cubature formula 

dim = X!^m(fii)'5^(fii)tf^(fii) 



(A5) 



(A6) 



where the integration weight w{D,i) is essentially the solid angle of pixel i. Assuming that 
our pixels are equally spaced in at a given 9, this may be cast in the form 



(A7) 
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with 

21 + 1 (I — mV 

aim = Y.^{ei)qm{ei). : ^i(cosg.) (A8) 

^ \ 47r [l + my. 

where w{6i) is proportional to the pixel solid angle at 6i. 

In practice, we actually wish to evaluate expressions of the form 

lllm = Y.Ylmmfm (A9) 



where f{^) is some function on the sphere. These terms arise in the normal equations, 
and (pSD, where / is an inverse variance weighted temperature map, and in the factorization 
of matrices into convolutions of a diagonal matrix with the spherical harmonics, as in 
equation (^. This expression is distinct from the formal inverse transform, but may be 
evaluated using the same FFT methods by substituting / for 5t and omitting the integration 
weights. 

The reader can easily verify that inverting maps is also an 0{N^^'^) process requiring 
only 0{N) memory storage. One can also exploit various symmetries, such as g_m = 
and P^(cos(7r — 9)) = (— l)'+™P^(cos 6*) to further speed up the transforms. In practice, 
we find the lion's share of cpu time for both the forward and inverse transforms is spent 
in generating the P^. Since we already demand 0{N^^'^) storage for the preconditioner 
matrix, it is only a modest increase to store the P^ in memory, which is also an 0{N^^'^) 
requirement. (Note that only one hemisphere need be stored.) This generally leads to an 
order of magnitude decrease in cpu time: for example, the time required for each / = 512 
transform is reduced from 25 s to 3 s. At present, we find that each I = 1024 transform 
takes about 25s as a single processor job on an SGI Origin 2000. 

We close this Appendix with some comments about the role of pixels in the evaluation 
of inverse spherical harmonic transforms. The cubature formula provides a formal check 
on pixelization schemes by ensuring that pixelization errors are negligible, i.e. that there 
is no information loss in going from a continuous to a discrete field. Formally this implies 
specifying a grid fli and integration weights w{fli)) such that the integration error 

N 

e = Y.w{Q,)f{Q,)- / f{Q)dQ (AlO) 
i=i 

vanishes when / is a polynomial of degree D. In this paper we employ a spherical product 
Gauss cubature formula for integrating over the sphere (see, e.g., Stroud 1971). This scheme 
has the property of requiring the minimum number of pixels for a given polynomial degree 
of any pixelization scheme. The grid points in are given by the zeroes of the Fourier 
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basis sines and cosines (simply the equiangular grid), while the grid points in 6 are given 
by the zeroes of the Legendre polynomial basis. Standard routines for calculating the grid 
points and weights, w{6i), for the associated Legendre quadrature formula exist (Press 
et al., 1992). This grid has negligible integration error and also has the slight advantage 
over a purely equiangular grid that the grid points and weights are a function of the 
integration range. The arrangement of sampled points can therefore be made optimal for a 
sphere with a Galaxy cut. This pixelization scheme does have elongated pixels of smaller 
area near the poles. Thus, since pixel noise scales with pixel area, our scheme contains a 
disproportionately large number of noisy pixels near the poles. In practice, we have not 
found such endpoint apodizing to be harmful. 

What are the optimal choices for uq and in terms of /max? In terms of the 
Gaussian cubature formula, the optimal choice is ne — Imax + '^i ntj, — 2ljnax + 1, which 
integrates exactly the first {Imax + 1)^ spherical harmonics. The accuracy of this grid is 
easily verified numerically by observing the exact orthonormality of the first {Imax + 1)^ 
spherical harmonics on the full sky. (Note that COBE type pixelization schemes, which 
have equal area per pixel, require a substantially larger number of points to achieve the 
same accuracy.) Note that because the number of a/m's is half the number of integration 
points, the spherical harmonic transform is not invertible unless the integrated function is 
bandwidth limited to Imax + 1- Indeed, it has been proven (Taylor 1995) that there does 
not exist any cubature-based discrete spherical harmonic transform with the same number 
of points as spectral coefficients. This highlights an important difference between Fourier 
analysis on and on S"^. Consider any arbitrary 2D function f{9, 0). If mapped onto the 
plane, we can obtain Fourier coefficients f{l,m) with ni = ne, nm = n^, which is invertible. 
On the sphere, however, the spherical harmonics by design only span the space of functions 
invariant under rotations. This implies symmetries between the 9 and (f) directions, which 
in the spherical harmonic basis imply that even though ni = no, nm = n^, half the terms 
are missing as aim = for m > I. This reduction in the number of spectral coefficients 
leads to loss of information (by a factor of 2) for functions which do not respect such 
symmetries, and which thus will no longer be invertible. Note that there do not exist any 
discrete pixelizations of the sphere which are invariant under the full group of rotations (for 
instance, ECP is only azimuthally symmetric), due to the finite number of Platonic solids. 

For practical purposes, the CMB signal is band-width limited, as beam smearing 
virtually destroys any signal above some Imax- However, the noise map is not bandwidth 
limited and cannot be projected onto a finite set of spherical harmonics. We see that by 
noting that the underlying weight map is not rotationally invariant (i.e., not statistically 
isotropic), in general, which is why the noise matrix N(im){imY is dense, even over the full 
sky. This has important practical consequences when maximizing the likelihood function. 
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In particular, one is not free to transform between pixel and spherical harmonic space. 
It is impossible, for instance, to precondition in spherical harmonic space and maximize 
the likelihood in pixel space: the (essentially noise) eigenvalues are wiped out in spherical 
harmonic space but are retained in pixel space. 

We emphasize that our algorithm is independent of the pixelization scheme, as long 
as fast spherical harmonic transforms are available. For instance, the HEALPIX scheme 
(Gorski 1998), which uses equal area pixels, is also a viable option. 

B. Spline Fitting 

We have presented an algorithm for extracting the minimum variance power spectrum, 
Q, and its error matrix, F~^, from a map. This is all that is needed to make comparisons 
with Gaussian theoretical models. However, there are at least two good reasons why 
one would like to fit a smooth curve to the q data. 1) We have argued that it is better 
to construct the experimental Fisher matrix from the smoothed q to avoid bias in any 
subsequent analysis. 2) For the purpose of visual presentation we would like an aid to guide 
the eye though a forest of data points. 

The usual smoothing procedure is to either bin the data or employ a moving average, 
which reduces the errors by y^, where n is the number of points in each bin. The drawback 
of such a procedure is that while the zeroth- and first-order moments of the data are 
preserved, higher order moments are not. In particular, if the underlying function has 
a non-zero second derivative (e.g., at an acoustic peak), a bias is introduced (Press et 
al. 1992). A significant improvement would be to approximate the data as piecewise 
polynomial, rather than piecewise constant, as with Savitzky-Golay smoothing filters (Press 
et al. 1992). 

However, we advocate fitting a least-squares weighted smoothing spline (Wahba 1990) 
to the recovered q. Such a fit is non-parametric and model- independent in the sense that we 
allow ourselves l^ax ~ 1 degrees of freedom. We find the cubic spline f{l) which minimizes 
the quantity 

^(/(o - q)f,,(/(o - qo + a / ni)di. (Bi) 

iv •' 
The smoothing parameter A, which controls the usual trade-off between smoothness and 
fidehty to the data, is chosen to satisfy the relation 

{Ima. - 1) - V2(U, - 1) < < {Imax " 1) + ^2(^0. " 1) (B2) 

where = Y.ii'{f{l) — ci)Fii/{f{l') — q/). The slight freedom we have in selecting A within 
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this range is analogous to the freedom we have to select a bin size in an averaging scheme. 
We proceed with the fit iteratively: first, employing the Fisher matrix of the unsmoothed 
data, we construct a preliminary smoothing spline. Next, generating a new Fisher matrix 
from /(/), we construct a second and final smoothing spline. 

An alternative approach to choosing A is to use cross-validation. This is essentially a 
bootstrap technique in which A is chosen so that the spline best approximates the data at 
any given / if it is computed with all the data except q. In general, the two methods give 
similar results. The fits in this paper were obtained using cross-validation. 

To compute errors on our spline fit, we employ bootstrap resampling. Specifically, we 
calculate the exact Fisher matrix from /(/) and generate a series of synthetic power spectra, 
cf ^ from this Fisher matrix. This will be accurate to the extent that we have accurately 
computed F. One can also generate cf* assuming they are drawn from distributions with 
dispersion (F"^)!/^, which ignores correlations between the q. The two methods give very 
similar results. We then compute the smoothing spline for each realization cf^ and sort the 
results at each /. This allows us to generate confidence intervals for the spline fits. Figure |^ 
shows the customary 68% confidence band. 

The use of a smoothing spline entails a small but non-negligible bias because the mean 
of the Monte Carlo spline fits does not equal the input power spectrum. Such effects are 
well-known (Wahba 1990). We calibrate this bias from the Monte Carlo simulations and 
apply it to our best fit spline and confidence bands. Note also that the confidence bands of 
the spline fit are not symmetric about the mean value, particularly near a peak. The use of 
Monte Carlo simulations gives us the probability distribution of the spline fits and thus a 
firm handle on such effects. 
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Fig. 1.— Normalized Cm vs m, where = \^im\ / (Imax + 1 - m) and W;^ is the 

spherical harmonic expansion of the weight map 1/crf. Note that only the m — 0,2,4,6 
terms are significant, implying that is in fact very sparse. The baseline extending out 
to high m is contributed by the pixels cut due to point sources; it disappears if there are no 
such cuts. 
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Fig. 2. — Plot of A = x^/{lmax — 1) — 1 vs. number of Monte Carlo simulations used to 
compute tr(C~^P'). This result is based on solving for q from a given 512 x 1024 pixel map. 
Our expectation that A oc l/N^c (dashed line) is met. 
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Fig. 3. — Window functions at various /. At high signal to noise, wc observe the well-known 
translational independence of the window function. The small size of the galactic cut allows 
for an extremely narrow window function. At low signal to noise correlations between widely 
separated I appear, but remain very weak. 
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Fig. 4. — Recovered power spectrum q for a single simulated 1024 x 2048 pixel map. 10 
Monte Carlo simulations were used to compute tr(C~^P'). The grey band indicates the 
one sigma uncertainty given by (F;7^)^''^. The light dashed line indicates the starting guess. 
This run converged in 3 iterations using the approximate method and 2 subsequent iterations 
using the Monte Carlo method and took approximately 10 hours as an 8 processor job on an 
SGI Origin 2000 computer. See Figure |^ for an indication of how well the power spectrum 
can be recovered under the assumption that it is a smooth function of /. 
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Fig. 5. — Scatter plot of 5 = (^cpcovered _ ^true^^^^^ where C;''"'^ is the input power spectrum, 
^recovered jg ^^^q recovered spectrum shown in Figure ^, and o"/ = (F"^)!/^. 
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Fig. 6. — Histogram plot of the distribution of 5 = (cp^"'^'""'^'^ — cY^'^)/ai. Also depicted is 
a Gaussian distribution of zero mean and unit variance; the two distributions agree closely. 
67% of the points lie between 5 = —1 and 5 = +1, and 95% of the points lie between S = —2 
and 6 = +2. 
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Fig. 7. — The result of a smoothing sphne fit to the points in Figure ^ The dark sohd hne is 
the sphne fit, the hght sohd hne is the input power spectrum. The grey band indicates the 
68% confidence band obtained from bootstrap simulations. Including the prior expectation 
that the underlying power spectrum is smooth allows one to obtain a very firm handle on 
the power spectrum. 
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Fig. 8. — The solid line indicates the input power spectrum, while the dashed line indicates 
the best-fit cosmological model from the data. The two are virtually identical. 



